Compressive Feature Learning

Authors

  • Hristo S. Paskov
  • Robert West
  • John C. Mitchell
  • Trevor J. Hastie
Abstract

This paper addresses the problem of unsupervised feature learning for text data. Our method is grounded in the principle of minimum description length and uses a dictionary-based compression scheme to extract a succinct feature set. Specifically, our method finds a set of word k-grams that minimizes the cost of reconstructing the text losslessly. We formulate document compression as a binary optimization task and show how to solve it approximately via a sequence of reweighted linear programs that are efficient to solve and parallelizable. As our method is unsupervised, features may be extracted once and subsequently used in a variety of tasks. We demonstrate the performance of these features over a range of scenarios including unsupervised exploratory analysis and supervised text categorization. Our compressed feature space is two orders of magnitude smaller than the full k-gram space and matches the text categorization accuracy achieved in the full feature space. This dimensionality reduction not only results in faster training times, but it can also help elucidate structure in unsupervised learning tasks and reduce the amount of training data necessary for supervised learning.
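
A minimal sketch may help make the objective concrete. The formulation below is reconstructed from the abstract alone and is not the authors' implementation: the unit pointer cost, the per-word dictionary cost, and the single untightened LP relaxation (solved with SciPy's linprog, standing in for the paper's sequence of reweighted LPs) are all assumptions for illustration.

```python
# Sketch of dictionary-based compression as a covering problem: choose
# pointers (k-gram occurrences) so every word position is reconstructed,
# minimizing pointer cost + dictionary storage cost. This is an LP
# relaxation of the binary program; costs and rounding are assumptions.
import numpy as np
from scipy.optimize import linprog

def compress(words, max_k=3, pointer_cost=1.0, dict_cost_per_word=1.0):
    T = len(words)
    pointers, grams = [], {}       # pointer = (start, k-gram); gram -> index
    for i in range(T):
        for k in range(1, max_k + 1):
            if i + k <= T:
                g = tuple(words[i:i + k])
                grams.setdefault(g, len(grams))
                pointers.append((i, g))
    P, G = len(pointers), len(grams)
    # Variables: x_p for each pointer, then y_g for each dictionary k-gram.
    c = np.concatenate([np.full(P, pointer_cost),
                        [dict_cost_per_word * len(g) for g in grams]])
    # Coverage: each position t needs >= 1 chosen pointer spanning it,
    # written as -sum_{p covers t} x_p <= -1.
    A_cov = np.zeros((T, P + G))
    for p, (i, g) in enumerate(pointers):
        A_cov[i:i + len(g), p] = -1.0
    # Linking: a pointer is usable only if its k-gram is in the dictionary,
    # i.e. x_p - y_g <= 0.
    A_link = np.zeros((P, P + G))
    for p, (i, g) in enumerate(pointers):
        A_link[p, p] = 1.0
        A_link[p, P + grams[g]] = -1.0
    res = linprog(c, A_ub=np.vstack([A_cov, A_link]),
                  b_ub=np.concatenate([-np.ones(T), np.zeros(P)]),
                  bounds=(0, 1))
    # Fractional solutions would be tightened by reweighting in practice.
    return [g for g, j in grams.items() if res.x[P + j] > 0.5]

print(compress("to be or not to be".split(), max_k=2))
# The repeated bigram ('to', 'be') earns a dictionary entry, while 'or'
# and 'not' stay as unigrams.
```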


Similar Articles

Compressive Statistical Learning with Random Feature Moments

We describe a general framework, compressive statistical learning, for resource-efficient large-scale learning: the training collection is compressed in one pass into a low-dimensional sketch (a vector of random empirical generalized moments) that captures the information relevant to the considered learning task. A near-minimizer of the risk is computed from the sketch through the solution of a ...
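
As a toy illustration of the sketching idea, the snippet below compresses a dataset in one pass into a fixed-size vector of empirical random-feature moments; the random Fourier feature map and the sketch size are illustrative choices, not the paper's construction.

```python
# One-pass sketch: the empirical mean of random Fourier features.
# Feature map and dimensions are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, m = 2, 64                          # data dimension, sketch dimension
W = rng.normal(size=(m, d))           # random frequencies
b = rng.uniform(0, 2 * np.pi, m)      # random phases

def sketch(X):
    return np.cos(X @ W.T + b).mean(axis=0)   # O(m) memory, single pass

X = rng.normal(size=(10000, d))       # stands in for the training collection
z = sketch(X)                         # learning then works from z alone
# Two halves of the same distribution yield nearly identical sketches:
print(np.linalg.norm(sketch(X[:5000]) - sketch(X[5000:])))
```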


Compressive Reinforcement Learning with Oblique Random Projections

Compressive sensing has been rapidly growing as a non-adaptive dimensionality reduction framework, wherein high-dimensional data is projected onto a randomly generated subspace. In this paper we explore a paradigm called compressive reinforcement learning, where approximately optimal policies are computed in a low-dimensional subspace generated from a high-dimensional feature space through rando...
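
The projection step itself is simple to illustrate. The sketch below uses a plain Gaussian random projection; the paper's oblique (non-orthogonal) constructions and the policy computation built on top of them are not reproduced here, and all shapes are arbitrary.

```python
# Project a high-dimensional feature matrix onto a random low-dimensional
# subspace before learning. Dimensions and the Gaussian projection are
# illustrative assumptions, not the paper's construction.
import numpy as np

rng = np.random.default_rng(0)
n, D, d = 500, 10_000, 50                  # samples, ambient dim, target dim
Phi = rng.normal(size=(n, D))              # high-dimensional features
P = rng.normal(size=(D, d)) / np.sqrt(d)   # random projection matrix
Phi_low = Phi @ P                          # policies/values would be fit here
print(Phi_low.shape)                       # (500, 50)
```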


An Efficient Algorithm for Large Scale Compressive Feature Learning

This paper focuses on large-scale unsupervised feature selection from text. We expand upon the recently proposed Compressive Feature Learning (CFL) framework, a method that uses dictionary-based compression to select a K-gram representation for a document corpus. We show that CFL is NP-complete and provide a novel and efficient approximation algorithm based on a homotopy that transforms a convex...


An Overview of Compressive Trackers

Compressive tracking has become considerably popular in the visual tracking community in recent years. The strong theoretical support from compressive sensing has motivated many researchers to follow this direction, and a wide range of compressive trackers with attractive performance has emerged. The goal of this paper is to overview some of the most recent state-of-the-art compressive trackers in the literature. First, ...


Sparse Modeling of High-Dimensional Data for Learning and Vision

Sparse representations account for most or all of the information of a signal with a linear combination of a few elementary signals called atoms, and have increasingly been recognized as providing high performance for applications as diverse as noise reduction, compression, inpainting, compressive sensing, pattern classification, and blind source separation. In this dissertation, we learn the s...
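
To make the "few atoms" idea concrete, here is a small NumPy sketch of orthogonal matching pursuit recovering a sparse representation over a random dictionary; the learned dictionaries and specific models of the dissertation are beyond this illustration, and all dimensions are arbitrary.

```python
# Greedy sparse coding (orthogonal matching pursuit) over a random
# unit-norm dictionary; all sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_dim, n_atoms, sparsity = 32, 128, 3
D = rng.normal(size=(n_dim, n_atoms))
D /= np.linalg.norm(D, axis=0)                        # unit-norm atoms
x = D[:, [5, 40, 99]] @ np.array([1.0, -2.0, 0.5])    # planted 3-sparse signal

support, r = [], x.copy()
for _ in range(sparsity):
    support.append(int(np.argmax(np.abs(D.T @ r))))   # best-matching atom
    coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
    r = x - D[:, support] @ coef                      # re-fit, update residual
print(sorted(support))  # typically recovers the planted support [5, 40, 99]
```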





Publication date: 2013